Abstract

We study the complexity of polynomial multiplication over arbitrary
fields. We present a unified approach that generalizes all known
asymptotically fastest algorithms for this problem. In particular, the
well-known algorithm for multiplication of polynomials over fields
supporting DFTs of large smooth orders, Schönhage-Strassen's algorithm
over arbitrary fields of characteristic different from 2, Schönhage's
algorithm over fields of characteristic 2, and Cantor-Kaltofen's
algorithm over arbitrary algebras all appear to be instances of this
approach. We also obtain faster algorithms for polynomial multiplication
over certain fields which do not support DFTs of large smooth orders.

We prove that the Schönhage-Strassen upper bound cannot be improved
further over the field of rational numbers if we consider only
algorithms based on consecutive applications of the DFT, as all the
fastest known algorithms are. We also explore ways to transfer Fürer's
recent algorithm for integer multiplication to the problem of polynomial
multiplication over arbitrary fields of positive characteristic.

This work is inspired by the recent improvement for the closely related
problem of the complexity of integer multiplication by Fürer and its
subsequent modular arithmetic treatment due to De, Kurur, Saha, and
Saptharishi. We explore the barriers in transferring the techniques for
solutions of one problem to a solution of the other.



1 Introduction 

*This research is supported by Cluster of Excellence "Multimodal
Computing and Interaction" at Saarland University.

Complexity of polynomial multiplication is one of the central problems
in computer algebra and algebraic complexity theory. Given two
univariate polynomials by vectors of their coefficients,

\[ a(x) = \sum_{i=0}^{n-1} a_i x^i, \qquad b(x) = \sum_{j=0}^{n-1} b_j x^j, \tag{1} \]

over some field $k$, the goal is to compute the coefficients of their
product

\[ c(x) = a(x) \cdot b(x) = \sum_{\ell=0}^{2n-2} c_\ell x^\ell = \sum_{\ell=0}^{2n-2} \sum_{\substack{0 \le i, j < n \\ i + j = \ell}} a_i b_j x^\ell. \tag{2} \]

The direct way by the formulas above requires $n^2$ multiplications and
$(n-1)^2$ additions of elements of $k$, making the total complexity of
the naive algorithm $O(n^2)$. In what follows we call $k$ the ground
field.
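The operation counts of the naive algorithm can be checked with a short sketch. The function name `naive_poly_mul` and the use of Python integers as stand-ins for elements of the ground field are assumptions made only for this illustration.

```python
def naive_poly_mul(a, b):
    """Multiply two coefficient vectors of equal length n by formula (2).

    Returns (c, mults, adds): the 2n-1 product coefficients and the
    numbers of coefficient multiplications and additions performed.
    """
    n = len(a)
    assert len(b) == n and n >= 1
    c = [None] * (2 * n - 1)
    mults = adds = 0
    for i in range(n):
        for j in range(n):
            term = a[i] * b[j]
            mults += 1
            if c[i + j] is None:
                c[i + j] = term            # first term of c_{i+j}: no addition
            else:
                c[i + j] = c[i + j] + term
                adds += 1
    return c, mults, adds
```

Each of the $2n-1$ output coefficients receives its first term for free, so the number of additions is $n^2 - (2n-1) = (n-1)^2$.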

1.1 Model Of Computation 

We study the total algebraic complexity of the multiplication of
polynomials over fields. That is, elements of $k$ are thought of as
algebraic entities, and each binary arithmetic operation on these
entities has unit cost. This model is rather abstract in the sense that
it counts, for example, an infinite-precision multiplication of two
reals as a unit-cost operation. On the other hand, it has the advantage
of being independent of any concrete implementation, which may depend on
many factors, including human-related ones; thus it is more universal,
see the discussion on this topic in [9, Introduction].

We are concerned with the total number of arithmetic operations, i.e.,
multiplications and additions/subtractions, that are sufficient to
multiply two degree $n-1$ polynomials. Since the resulting functions can
be computed without divisions, it seems natural to consider only
division-free algebraic algorithms. The inputs of such an algorithm are
the values $a_0, \dots, a_{n-1}, b_0, \dots, b_{n-1} \in k$, the outputs
are the values $c_0, c_1, \dots, c_{2n-2} \in k$ as defined in (1), (2).
Any step of an algorithm is a multiplication, a division, an addition or
a subtraction of two values, each being an input, a value previously
computed by the algorithm, or a constant from the ground field. An
algorithm computes the product of two degree $n-1$ polynomials if all
outputs $c_0, \dots, c_{2n-2}$ are computed in some of its steps. The
number of steps of an algorithm $A$ is called the algebraic or
arithmetic complexity of $A$.

In what follows, we will always consider division-free algebraic
algorithms. A multiplication performed in a step of an algorithm is
called scalar if at least one multiplicand is a field constant, and
nonscalar otherwise. For an algorithm $A$ which computes the product of
two degree $n-1$ polynomials, we define $L^m_A(n)$ to be the number of
nonscalar multiplications used in $A$, and $L^a_A(n)$ to be the total
number of additions, subtractions and scalar multiplications in $A$. We
also set $L_A(n) := L^m_A(n) + L^a_A(n)$, the total algebraic complexity
of $A$ computing the product of two degree $n-1$ polynomials. In what
follows, $\mathcal{A}_k^n$ always stands for the set of division-free
algorithms computing the product of two degree $n-1$ polynomials over
$k$, and

\[ L^m_k(n) := \min_{A \in \mathcal{A}_k^n} L^m_A(n), \qquad L^a_k(n) := \min_{A \in \mathcal{A}_k^n} L^a_A(n), \qquad L_k(n) := \min_{A \in \mathcal{A}_k^n} L_A(n). \]

When the field $k$ is clear from the context or insignificant, we will
use the simplified notation $L^m(n)$, $L^a(n)$ and $L(n)$, respectively.
Note that $L(n)$ need not be equal to $L^m(n) + L^a(n)$, since the
minimal number of nonscalar multiplications and the minimal number of
additive operations and scalar multiplications can be achieved by
different algorithms.

1.2 Fast Polynomial Multiplication And Lower Bounds 

Design of efficient algorithms and proving lower bounds is a classical
problem in algebraic complexity theory that has received wide attention
in the past. For an exhaustive treatment of the current state of the art
we advise the reader to refer to [9, Sections 2.1, 2.2, 2.7, 2.8]. There
exists an algorithm $A \in \mathcal{A}_k^n$ such that¹

\[ L^m_A(n) = O(n), \qquad L^a_A(n) = O(n \log n), \qquad L_A(n) = O(n \log n), \tag{3} \]

if $k$ supports the Discrete Fourier Transform (DFT) of order $2^l$
[9, Chapter 1, Section 2.1] or $3^l$ [9, Exercise 2.5] for each $l > 0$.
Schönhage-Strassen's algorithm $B \in \mathcal{A}_k^n$ computes the
product of two degree $n-1$ polynomials over an arbitrary field $k$ of
characteristic different from 2 with

\[ L^m_B(n) = O(n \log n), \qquad L^a_B(n) = O(n \log n \log\log n), \qquad L_B(n) = O(n \log n \log\log n). \tag{4} \]

cf. [24], [9, Section 2.2]. In fact, the original algorithm of [24]
computes the product of two $n$-bit integers, but it readily transforms
into an algorithm for degree $n-1$ polynomial multiplication. For fields
of characteristic 2, Schönhage's algorithm [23], [9, Exercise 2.6] has
the same upper bounds as in (4). An algorithm $C'$ for multiplication of
polynomials over arbitrary rings with the same upper bound for
$L^m_{C'}(n)$ was first proposed by Kaminski in [17]. However, there was
no matching upper bound for $L^a_{C'}(n)$. Cantor and Kaltofen
generalized Schönhage-Strassen's algorithm into an algorithm $C$ for the
problem of multiplication of polynomials over arbitrary algebras (not
necessarily commutative, not necessarily associative) achieving the
upper bounds (4), see [11].

For the rest of the paper, we will use the notation introduced above:
$A$ will always stand for the multiplication algorithm via DFT with
complexity upper bounds (3), $B$ will stand for Schönhage-Strassen's
algorithm if $\operatorname{char} k \ne 2$ and for Schönhage's algorithm
if $\operatorname{char} k = 2$, both with complexity upper bounds (4),
and $C$ will stand for Cantor-Kaltofen's algorithm for multiplication of
polynomials over arbitrary algebras with the same complexity upper
bounds as Schönhage-Strassen's algorithm.

¹In this paper we always use $\log := \log_2$.
Upper and lower bounds for $L^m(n)$, which is also called the
multiplicative complexity, have received special attention in the
literature, see, e.g., [9, Section 14.5]. Interestingly, for each $k$
there always exists an algorithm $E \in \mathcal{A}_k^n$ with
$L^m_E(n) = O(n)$, if we do not worry that $L^a_E(n)$ will be worse than
in (4), see [12, 25].

If $|k| \ge 2n-2$, then it is known that $L^m_k(n) = 2n-1$, see
[9, Theorem (2.2)]. For the fields $k$ with $n-2 \le |k| \le 2n-3$, the
exact value $L^m_k(n) = 3n - \lfloor |k|/2 \rfloor - 2$ was proved by
Kaminski and Bshouty in [19, Theorem 2] (see [7, Lemma 1] for the proof
that the theorem holds for the multiplicative complexity).

In order to multiply two degree $n-1$ polynomials over $\mathbb{F}_q$ it
suffices to pick a polynomial $p(x)$ of degree $2n-1$ irreducible over
$\mathbb{F}_q$ and multiply two elements in $\mathbb{F}_q[x]/p(x)$, that
is, in $\mathbb{F}_{q^{2n-1}}$. Therefore, for finite fields
$k = \mathbb{F}_q$ with $|k| = q \le n-3$, the currently best upper
bounds for $L^m_{\mathbb{F}_q}(n)$ are derived from Chudnovskys'
algorithm for multiplication in finite field extensions [12, 25] and its
improvements by Ballet et al. ($p$ always stands for a prime number; in
fact all of the following upper bounds hold also for the bilinear
complexity, which is a special case of multiplicative complexity, when
each nonscalar multiplication in an algorithm is of the kind
$\ell(a_0, \dots, a_{n-1}) \cdot \ell'(b_0, \dots, b_{n-1})$ for some
linear forms $\ell, \ell' \in (k^n)^*$):

'4(1 + 7f=3)n + o(n), q = p^'^> 25, [12, Theorem 7.7], 
4(1 + ■:^)n, q = p^>^> 16, [1, Theorem 3.1], 



6(1 + ^)n, 5 = p > 5, [3, Theorem 2.3], 



6(1 + ^)n, q=p''> 16, [2, Theorem 4.6], 



12(1 + ^)n, g > 3, [1, Corollary 3.1], 

54n — 27, q = 3, [1, Remark after Corollary 3.1], 

^ - ^ < 36.7n, 9 = 2, [4, Theorem 3.4]. 

The best known lower bounds in the case of $k = \mathbb{F}_q$ when $q \le n-3$ are



iF,(n)>L?(n)> 



'{3+ /l-'}ly, )n-o{n), q>3, [18], 
^3.52n- o(n), q = 2, [6]. 



If we allow for a moment divisions to be present in an algorithm, then
there is a lower bound $3n - o(n)$ for the total number of nonscalar
multiplications and divisions necessary for any algebraic algorithm
computing the product of two degree $n$ polynomials, see [8].

There are few lower bounds for the algebraic complexity of polynomial
multiplication. Most of them actually bound $L^m(n)$, which can be used
as a conservative lower bound for $L(n)$. Since the coefficients
$c_0, \dots, c_{2n-2}$ are linearly independent, in the case of
division-free algorithms one immediately obtains the lower bound
$L(n) \ge L^m(n) \ge 2n-1$ over arbitrary fields. To date, this is the
only general lower bound for $L(n)$ which does not depend on the ground
field. Bürgisser and Lotz in [10] proved the only currently known
nonlinear lower bound $\Omega(n \log n)$ for $L_{\mathbb{C}}(n)$
(actually, on $L^a_{\mathbb{C}}(n)$), which holds in the case when all
scalar multiplications in an algorithm are with bounded constants.

The gap between the upper and the lower bounds on $L_k(n)$ motivates the
search for better multiplication algorithms and for higher lower bounds
for the complexity of polynomial multiplication, in particular over
small fields. For example, it is still an open problem whether the total
algebraic complexity of polynomial multiplication is nonlinear, see
[9, Problem 2.1]. Another well-known challenge is to decrease the upper
bound for $L_k(n)$ of (4) to the level of (3) in the case of arbitrary
fields, see [21] for the more general challenge of multivariate
polynomial multiplication. In this paper we partially address both
problems.

1.3 Our Results 

As our first contribution, for every field $k$, we present an algorithm
$D_k \in \mathcal{A}_k^n$, which is a generalization of
Schönhage-Strassen's construction that works over arbitrary fields and
achieves the best known complexity upper bounds. In fact, we argue that
the algorithm $D_k$ stands for a generic polynomial multiplication
algorithm that relies on consecutive application of DFT. In particular,
the algorithms $A$, $B$, and $C$ come as special cases of the algorithm
$D_k$. We are currently not aware of any algorithms with an upper bound
of (4) that are not based on consecutive DFT applications and thus do
not follow from the algorithm $D_k$.

As the second contribution, we show that
$L_{D_k}(n) = o(n \log n \log\log n)$ in the case when algorithm $A$
cannot be applied but the field $k$ has some simple algebraic properties
that are ignored by algorithms $B$ and $C$. This improves the upper
bound of (4) over such fields. We also present a parameterization of
fields $k$ with respect to the performance of the algorithm $D_k$, and
give explicit upper bounds which depend on this parameterization. More
precisely, over each field $k$, we have
$\Omega(n \log n) = L_{D_k}(n) = O(n \log n \log\log n)$, and over
certain fields that do not admit low-overhead application of the
algorithm $A$, the algorithm $D_k$ achieves intermediate complexities
between the indicated bounds.

Finally, we show that the algorithm $D_k$ has natural limitations
depending on the ground field $k$. For example, we prove that
$L_{D_{\mathbb{Q}}}(n) = \Omega(n \log n \log\log n)$. Furthermore, we
characterize all fields where the application of DFT-based methods does
not lead to any improvement of the upper bound (4). Therefore, we
consider this an exhaustive exploration of the performance of generic
algorithms for polynomial multiplication based on the application of
DFT.

1.4 Organization Of the Paper 

Section 2 contains the necessary algebraic preliminaries. We then give a
uniform treatment of the best known algorithms for polynomial
multiplication over arbitrary fields in Section 3: Schönhage-Strassen's
algorithm [24], Schönhage's algorithm [23] and Cantor-Kaltofen's
algorithm [11]. In Section 4 we recall the best known upper bounds for
the computation of DFTs over different fields and show some efficient
applications of their combination. We also indicate limitations of the
known techniques. Section 5 contains our main contributions. We end with
one particular number-theoretic conjecture due to Bläser on the
existence of special finite field extensions. In fact, if it holds, then
the algorithm $D_k$ can achieve better performance than that of the
previously known algorithms $B$ and $C$ over any field of characteristic
different from 0.



2 Basic Definitions 

In what follows we will denote the ground field by $k$. Algebra will
always stand for a finite-dimensional associative algebra over some
field with unity 1. For a function $f : \mathbb{N} \to \mathbb{R}$, a
positive integer $n$ is called $f$-smooth if each prime divisor of $n$
does not exceed $f(n)$. Note that this definition is not trivial only if
$f(n) < n$. If $f(n) = O(1)$, then an $f$-smooth positive integer is
called just smooth.
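The definition of $f$-smoothness can be made concrete with a small sketch. The helper names `prime_divisors` and `is_f_smooth` are hypothetical, and trial division is used only because it is the simplest way to list prime divisors.

```python
def prime_divisors(n):
    """Distinct prime divisors of a positive integer n, by trial division."""
    primes, d = [], 2
    while d * d <= n:
        if n % d == 0:
            primes.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        primes.append(n)
    return primes


def is_f_smooth(n, f):
    """True if every prime divisor of n is at most f(n)."""
    return all(p <= f(n) for p in prime_divisors(n))
```

With a constant function such as `lambda n: 3`, the smooth integers are exactly those of the form $2^i 3^j$, which covers the orders $2^\nu$ and $3^\nu$ used later.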

All currently known fastest algorithms for polynomial multiplication
over arbitrary fields rely on the possibility to apply the Discrete
Fourier Transform by means of the Fast Fourier Transform algorithm (FFT)
and on the estimation of the overhead needed to extend the field to make
DFTs available. This possibility depends on the existence of so-called
principal roots of unity of large smooth orders, e.g., of orders $2^\nu$
for all $\nu > 0$.

Let $A$ be an algebra over a field $k$. $\omega \in A$ is called a
principal $n$-th root of unity if $\omega^n = 1_A$ (where $1_A$ is the
unity of $A$) and for $1 \le i < n$, $1 - \omega^i$ is not a zero
divisor in $A$. It follows that if $\omega \in A$ is a principal $n$-th
root of unity, then $\operatorname{char} k \nmid n$ and

\[ \sum_{j=0}^{n-1} \omega^{ij} = \begin{cases} n, & \text{if } i \equiv 0 \pmod{n}, \\ 0, & \text{otherwise.} \end{cases} \tag{5} \]

If $A$ is a field, then $\omega \in A$ is a principal $n$-th root of
unity iff $\omega$ is a primitive $n$-th root of unity. For a principal
$n$-th root of unity $\omega \in A$, the map

\[ \mathrm{DFT}_n^\omega : A[x]/(x^n - 1) \to A^n \]

defined as
$\mathrm{DFT}_n^\omega\bigl(\sum_{i=0}^{n-1} a_i x^i\bigr) = (\tilde a_0, \dots, \tilde a_{n-1})$,
where $\tilde a_i = \sum_{j=0}^{n-1} \omega^{ij} a_j$ for
$i = 0, \dots, n-1$, is called the Discrete Fourier Transform of order
$n$ over $A$ with respect to the principal $n$-th root of unity
$\omega$.

It follows from the Chinese Remainder Theorem that if $\omega \in A$ is
a principal $n$-th root of unity, then $\mathrm{DFT}_n^\omega$ is an
isomorphism between $A[x]/(x^n - 1)$ and $A^n$. (5) implies that the
inverse transform of $\mathrm{DFT}_n^\omega$ is
$\frac{1}{n} \mathrm{DFT}_n^{\omega^{-1}}$, since $\omega^{-1}$ is also
a principal $n$-th root of unity in $A$ [9, Theorem (2.6)]:
$a_i = \frac{1}{n} \sum_{j=0}^{n-1} \omega^{-ij} \tilde a_j$ for
$i = 0, \dots, n-1$. Note that if $\omega \in A$ is a principal $n$-th
root of unity and
$a(x) = a_0 + a_1 x + \dots + a_{n-1} x^{n-1} \in k[x]/(x^n - 1)$, then

\[ \mathrm{DFT}_n^\omega(a(x)) = \bigl(a(\omega^0), a(\omega), \dots, a(\omega^{n-1})\bigr). \]
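These definitions can be tried out in the simplest setting where $A$ is a prime field $\mathbb{Z}/p\mathbb{Z}$, so that "not a zero divisor" just means "nonzero". The following is a minimal sketch under that assumption; the function names and the choice $p = 17$, $n = 4$, $\omega = 4$ are illustrative, and the modular inverse via `pow(x, -1, p)` needs Python 3.8 or later.

```python
def is_principal_root(w, n, p):
    """In the field Z/pZ (p prime): w^n = 1 and 1 - w^i != 0 for 0 < i < n."""
    return pow(w, n, p) == 1 and all(pow(w, i, p) != 1 for i in range(1, n))


def dft(a, w, p):
    """DFT of order n = len(a): the i-th entry is sum_j w^(i*j) * a_j mod p."""
    n = len(a)
    return [sum(pow(w, i * j, p) * a[j] for j in range(n)) % p
            for i in range(n)]


def inverse_dft(a_hat, w, p):
    """Inverse transform: (1/n) times the DFT with respect to w^(-1)."""
    n = len(a_hat)
    n_inv = pow(n, -1, p)      # requires that p does not divide n
    w_inv = pow(w, -1, p)
    return [n_inv * x % p for x in dft(a_hat, w_inv, p)]
```

This direct evaluation uses $O(n^2)$ operations; it only illustrates the definition and the inversion formula, not the fast algorithm.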
An important property of the DFT is that it can be computed efficiently
under certain conditions, see Section 4. We only mention here that if
$n = s^\nu$ for some constant $s$ and there is a principal $n$-th root
of unity $\omega$ in an algebra $A$, then $\mathrm{DFT}_n^\omega$ can be
computed in $O(n \log n)$ additions of elements of $A$ and
multiplications of elements of $A$ by powers of $\omega$.

3 State Of the Art 

3.1 Multiplication via DFT 

The easiest way to illustrate the power of applications of the DFT is to
consider multiplication of polynomials over a field $k$ which contains
primitive roots of unity of large smooth orders. Assume that for some
integer constant $s \ge 2$ and for each $\nu$, $k$ contains a primitive
$s^\nu$-th root of unity. The well-known DFT-based algorithm $A$ takes
two degree $n-1$ polynomials $a(x)$ and $b(x)$ and proceeds as follows:

Embed and pad Set $\nu = \lceil \log_s(2n-1) \rceil$ such that
$s^\nu \ge 2n-1$. Pad the vectors of coefficients of $a(x)$ and $b(x)$
with zeroes and consider $a(x)$ and $b(x)$ as polynomials of degree
$s^\nu - 1$ in $k[x]/(x^{s^\nu} - 1)$. This step is performed at no
arithmetical cost.

Compute DFTs For a primitive $s^\nu$-th root of unity $\omega \in k$,
compute

\[ \tilde a := \mathrm{DFT}_{s^\nu}^{\omega}(a(x)), \qquad \tilde b := \mathrm{DFT}_{s^\nu}^{\omega}(b(x)). \]

The cost of this step is $O(n \log n)$ arithmetical operations over $k$
(recall that $s$ is a constant).

Multiply vectors Compute the component-wise product
$\tilde c := \tilde a \cdot \tilde b$, that is, perform $s^\nu = O(n)$
multiplications of elements in $k$.

Compute inverse DFT Compute

\[ \frac{1}{s^\nu} \mathrm{DFT}_{s^\nu}^{\omega^{-1}}(\tilde c) = c(x). \]

This step requires $O(n \log n)$ arithmetical operations in $k$.

As we can see, the total complexity is $O(n \log n)$ arithmetic
operations over $k$. Note that the number of multiplications is less
than $2ns - s$, and is linear in $n$ as long as $s$ is a constant.
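The four steps above can be sketched for $s = 2$ over a concrete field. The choices below are assumptions made for illustration only: the field $\mathbb{F}_{257}$ (where $257 - 1 = 2^8$, so it supports DFTs of order $2^l$ for $l \le 8$), the generator 3 of its multiplicative group, and the names `fft` and `multiply_via_dft`. Python 3.8 or later is needed for `pow(x, -1, P)`.

```python
P = 257   # prime with P - 1 = 2**8 (illustrative choice)
G = 3     # a generator of the multiplicative group of F_P


def fft(a, w):
    """Radix-2 FFT over F_P: returns [a(w^0), ..., a(w^(N-1))].

    N = len(a) must be a power of 2 and w a primitive N-th root of unity.
    """
    N = len(a)
    if N == 1:
        return a[:]
    even = fft(a[0::2], w * w % P)
    odd = fft(a[1::2], w * w % P)
    out = [0] * N
    t = 1
    for i in range(N // 2):
        out[i] = (even[i] + t * odd[i]) % P
        out[i + N // 2] = (even[i] - t * odd[i]) % P   # uses w^(N/2) = -1
        t = t * w % P
    return out


def multiply_via_dft(a, b):
    """Product of two length-n coefficient vectors over F_P."""
    n = len(a)
    assert len(b) == n
    N = 1
    while N < 2 * n - 1:            # embed and pad: N = 2^nu >= 2n - 1
        N *= 2
    assert (P - 1) % N == 0, "F_P has no primitive N-th root of unity"
    w = pow(G, (P - 1) // N, P)
    fa = fft(a + [0] * (N - n), w)  # compute DFTs
    fb = fft(b + [0] * (N - n), w)
    fc = [x * y % P for x, y in zip(fa, fb)]   # N nonscalar multiplications
    c = fft(fc, pow(w, -1, P))      # inverse DFT, up to the factor 1/N
    N_inv = pow(N, -1, P)
    return [x * N_inv % P for x in c[:2 * n - 1]]
```

The only nonscalar multiplications are the `N` products forming `fc`; everything else is additions and multiplications by powers of the root of unity, matching the split between the two complexity measures.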

3.2 Multiplication in Arbitrary Fields

Now suppose that $k$ does not contain the needed primitive roots of
unity. The methods we will describe now are all based on the idea of an
algebraic extension $K \supset k$ where the DFT of a large smooth order
$s^\nu$ is defined. In these methods one encodes the input polynomials
into polynomials of smaller degree over $K$ and uses the algorithm $A$
over $K$ to multiply these polynomials. The $s^\nu$ multiplications of
elements in $K$ are performed via an efficient reduction to
multiplication of polynomials of smaller degree, thus making the whole
scheme recursive.

3.2.1 Schönhage-Strassen's Algorithm

Assume that $\operatorname{char} k \ne 2$. In this case, $x$ is a
$2n$-th principal root of unity in $A_n := k[x]/(x^n + 1)$, which is a
$k$-algebra of dimension $n$ [9, (2.11)], and
$A[x]/(x^n + 1) \cong A[x]/(x^n - 1)$ if a $k$-algebra $A$ contains a
principal $2n$-th root of unity [9, (2.12)]. For $n \ge 3$,
Schönhage-Strassen's algorithm [24], which we denote by $B$, takes two
degree $n-1$ polynomials $a(x)$ and $b(x)$ over $k$ and proceeds as
follows:

Embed and pad Set $\nu = \lceil \log_2(2n-1) \rceil \ge 2$ such that
$N := 2^\nu \ge 2n-1$. Pad the vectors of coefficients of $a(x)$ and
$b(x)$ with zeroes and consider $a(x)$ and $b(x)$ as polynomials of
degree $N-1$ in $A_N$. This step is performed at no arithmetical cost.

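Extend Set $N_1 := 2^{\lceil \nu/2 \rceil} \ge 2$,
$N_2 := 2^{\lfloor \nu/2 \rfloor + 1}$, such that $N_1 \cdot N_2 = 2N$.
Encode $a(x)$ and $b(x)$ (considered as elements of $A_N$) as
polynomials of degree $N_2 - 1$ over $A_{N_1} = k[y]/(y^{N_1} + 1)$:

\[ a(x) = \sum_{i=0}^{N-1} a_i x^i \;\mapsto\; \sum_{i=0}^{N_2-1} \underbrace{\Bigl( \sum_{j=0}^{N_1/2-1} a_{\frac{N_1}{2} i + j}\, y^j \Bigr)}_{=:\ \bar a_i} x^i = \sum_{i=0}^{N_2-1} \bar a_i x^i, \]

and similarly for $b(x)$. $y$ is a $2N_1$-th principal root of unity in
$A_{N_1}$, and $2N_1 \ge N_2$, all powers of 2. Since $N_2 \mid 2N_1$,
$\psi := y^{2N_1/N_2}$ is a principal $N_2$-th root of unity in
$A_{N_1}$.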

Compute DFTs of order $N_2$ of $a(x)$ and $b(x)$ with respect to $\psi$.
Note that addition of two elements in $A_{N_1}$ can be performed in
$N_1$ additions in $k$, and multiplication by powers of $\psi$, that is,
by powers of $y$, results in cyclic shifts and sign changes and is also
bounded by $N_1$ additions (if we count a sign change as an additive
operation). Therefore, this step requires
$O(N_1 \cdot N_2 \log N_2) = O(N \log N)$ arithmetic operations over
$k$.

Multiply the coordinates: $\tilde a \cdot \tilde b = \tilde c$. This
results in computing $N_2$ products of polynomials of degree $N_1 - 1$,
which are computed by a recursive application of the currently described
procedure.

Compute inverse DFT of $\tilde c$ with respect to
$\psi^{-1} = y^{-2N_1/N_2}$. As before, this requires $O(N \log N)$
additive operations in $k$.

Unembedding in this case can be computed in the following way: since the
degrees in $y$ of all coefficients $\bar a_i$, $\bar b_i$ were at most
$\frac{N_1}{2} - 1$, and they were multiplied in $A_{N_1}$, the degrees
in $y$ of all coefficients $\bar c_i$ are at most $N_1 - 2 < N_1$.
Therefore, for all $i = 0, \dots, N_2 - 1$,

\[ \bar c_i = \sum_{j=0}^{N_1-1} c_{i,j}\, y^j \]

are already computed with some $c_{i,j} \in k$, and

\[ c(x) = \sum_{\ell=0}^{N-1} \bigl( c_{i,j} + c_{i-1,\, j + N_1/2} \bigr) x^\ell, \qquad \ell = \tfrac{N_1}{2}\, i + j, \quad 0 \le j < \tfrac{N_1}{2}, \]

can be computed by at most $N$ additions of elements in $k$ (we assume
that $c_{i,j} = 0$ if $i < 0$ or $j \ge N_1$).
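The remark in the DFT step that multiplication by a power of $y$ in $k[y]/(y^{N_1} + 1)$ amounts to a cyclic shift with sign changes can be sketched as follows. The function name `mul_by_power_of_y` is hypothetical, and Python integers stand in for elements of $k$.

```python
def mul_by_power_of_y(coeffs, j):
    """Multiply an element of k[y]/(y^M + 1), M = len(coeffs), by y^j.

    coeffs[i] is the coefficient of y^i. No multiplications in k are used,
    only index shifts and sign changes.
    """
    M = len(coeffs)
    j %= 2 * M                       # y has order 2M because y^M = -1
    out = [0] * M
    for i, c in enumerate(coeffs):
        q, r = divmod(i + j, M)      # y^(i+j) = (-1)^q * y^r
        out[r] = -c if q % 2 else c
    return out
```

Since every twiddle factor in the DFT over $A_{N_1}$ is such a power of $y$, the transform contributes only to the additive part of the complexity.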

Denoting by $L'_B(N)$ the total complexity of multiplication in $A_N$
via Schönhage-Strassen's algorithm $B$, we obtain the following
complexity inequality:

\[ L_B(n) \le L'_B(N) \le N_2\, L'_B(N_1) + O(N \log N). \]

It implies $L'_B(N) = O(N \log N \log\log N)$ and the desired estimates
(4) since $N < 4n - 2$. A more careful examination of the numbers of
additions and multiplications used also gives the upper bounds (4).
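As a hedged sketch of why this recurrence yields the stated bound, assume for simplicity (this assumption is not part of the argument above) that $\nu = \log N$ is a power of 2, so that $N_1 = 2^{\nu/2}$, $N_2 = 2^{\nu/2 + 1}$ and $N_1 N_2 = 2N$, and let $c$ be a constant bounding the $O(N \log N)$ term. With the normalized quantity $t(\nu) := L'_B(2^\nu) / (2^\nu \nu)$, an auxiliary symbol introduced only here, the recurrence reads

\[ t(\nu) = \frac{L'_B(N)}{N \nu} \le \frac{N_2\, L'_B(N_1)}{N \nu} + c = \frac{2\, L'_B(N_1)}{N_1\, \nu} + c = t(\nu/2) + c, \]

since $N_2 / N = 2 / N_1$ and $L'_B(N_1) = t(\nu/2) \cdot N_1 \cdot \nu/2$. Unrolling over the $\log \nu$ levels gives

\[ t(\nu) \le t(1) + c \log \nu, \qquad \text{hence} \qquad L'_B(N) \le N \log N \, \bigl( t(1) + c \log\log N \bigr) = O(N \log N \log\log N). \]

For general $\nu$ the ceiling in $N_1 = 2^{\lceil \nu/2 \rceil}$ introduces factors $(\nu+1)/\nu$ at the odd levels, which one expects to affect only the constants.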

A rough complexity analysis can also be made via the following
observations. The cost of each recursive step (by a recursive step we
understand all the work done at a fixed recursion depth) is
$O(N_1 \cdot N_2 \log N_2) = O(n \log n)$ and is determined by the
complexity of the DFT used to reduce the multiplication to several
multiplications of smaller formats. Note that in order to adjoin a
$2N_1$-th root of unity to $k$ in the initial step we take a (ring)
extension of degree $N_1$, which is half of the order of the root we
get. This crucial fact reduces the number of recursive steps to
$O(\log\log n)$. Thus, the upper bounds (4) for the complexity of $B$
can also be obtained as a product of the upper bound for the complexity
of a recursive step by the number of recursive steps.
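The doubly logarithmic recursion depth can be seen by tracking only the exponent $\nu = \log N$, which is replaced by $\lceil \nu/2 \rceil$ at each level because the recursive calls multiply in $A_{N_1}$ with $N_1 = 2^{\lceil \nu/2 \rceil}$. The sketch below makes two assumptions that are not in the text: the function name `recursion_depth` and a cut-off `cutoff_nu` below which the recursion is taken to stop.

```python
def recursion_depth(n, cutoff_nu=2):
    """Number of recursive levels when multiplying degree n-1 polynomials.

    Starts from nu = ceil(log2(2n - 1)) and maps nu to ceil(nu / 2) per
    level until nu <= cutoff_nu (a hypothetical base-case threshold).
    """
    nu = (2 * n - 2).bit_length()    # equals ceil(log2(2n - 1)) for n >= 1
    depth = 0
    while nu > cutoff_nu:
        nu = (nu + 1) // 2           # ceil(nu / 2)
        depth += 1
    return depth
```

Squaring the input size adds roughly one level, which is the $O(\log\log n)$ behaviour used in the rough analysis.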

3.2.2 Schönhage's Algorithm

Now assume that $\operatorname{char} k = 2$. Again, the first step is
the choice of a finite-dimensional algebra to reduce the original
polynomial multiplication to. In the case of
$\operatorname{char} k = 2$, the choice of $k[x]/(x^n + 1)$ does not
work, since it can be used efficiently only to adjoin $2^l$-th roots of
unity, and $x^{2^l} - 1 = (x - 1)^{2^l}$ in every field of
characteristic 2. Schönhage's algorithm [23] thus reduces the
multiplication of polynomials over $k$ to the multiplication in
$B_N := k[x]/(x^{2N} + x^N + 1)$, where $x$ is a $3N$-th principal root
of unity. Therefore, we can follow the way of the original
Schönhage-Strassen's algorithm with one important modification explained
in this section.
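Both claims can be checked experimentally over $k = \mathbb{F}_2$. In the sketch below, which is only an illustration, polynomials over $\mathbb{F}_2$ are encoded as Python integers (bit $i$ is the coefficient of $x^i$); the helper names and the choice $N = 9$ are assumptions. Principality is tested through the equivalent condition that $x^i + 1$ is coprime to the modulus for $0 < i < 3N$.

```python
def polymod2(a, m):
    """Remainder of a modulo m in F_2[x], polynomials given as bitmasks."""
    dm = m.bit_length() - 1
    while a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a


def polygcd2(a, b):
    """Greatest common divisor in F_2[x] by the Euclidean algorithm."""
    while b:
        a, b = b, polymod2(a, b)
    return a


def x_is_principal_root(order, modulus):
    """True if x^order = 1 and x^i + 1 is a unit modulo `modulus`
    for every 0 < i < order."""
    if polymod2(1 << order, modulus) != 1:
        return False
    return all(polygcd2((1 << i) | 1, modulus) == 1 for i in range(1, order))


N = 9                                         # a power of 3
B_MODULUS = (1 << (2 * N)) | (1 << N) | 1     # x^(2N) + x^N + 1
A_MODULUS = (1 << 8) | 1                      # x^8 + 1 = (x + 1)^8 over F_2
```

Here `x_is_principal_root(3 * N, B_MODULUS)` holds, while `x_is_principal_root(16, A_MODULUS)` fails because $x + 1$ divides $x^8 + 1$ in characteristic 2.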
For $n \ge 3$, Schönhage's algorithm $B$ takes two degree $n-1$
polynomials $a(x)$ and $b(x)$ and proceeds as follows:

Embed and pad Set $\nu = \lceil \log_3(n - \tfrac{1}{2}) \rceil$ such
that for $N := 3^\nu$, $2N \ge 2n - 1$. Pad the vectors of coefficients
of $a(x)$ and $b(x)$ with zeroes and consider $a(x)$ and $b(x)$ as
elements of $B_N$. This step is performed at no arithmetical cost.

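Extend Set $N_1 := 3^{\lceil \nu/2 \rceil}$ and
$N_2 := 3^{\lfloor \nu/2 \rfloor}$ such that $N_1 N_2 = N$. Encode the
input polynomials $a(x)$ and $b(x)$ (considered as elements of $B_N$) as
polynomials of degree $2N_2 - 1$ over
$B_{N_1} = k[y]/(y^{2N_1} + y^{N_1} + 1)$:

\[ a(x) = \sum_{i=0}^{2N-1} a_i x^i \;\mapsto\; \sum_{i=0}^{2N_2-1} \underbrace{\Bigl( \sum_{j=0}^{N_1-1} a_{N_1 i + j}\, y^j \Bigr)}_{=:\ \bar a_i} x^i = \sum_{i=0}^{2N_2-1} \bar a_i x^i, \]

and similarly for $b(x)$. $y$ is a $3N_1$-th principal root of unity in
$B_{N_1}$, and $N_1 \ge N_2$, both powers of 3. Thus,
$\psi = y^{N_1/N_2}$ is a $3N_2$-th principal root of unity in
$B_{N_1}$.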

Compute DFTs of $a(x)$ and $b(x)$, both padded to degree $3N_2$ with
zeroes, with respect to $\psi$. Note that addition of two elements in
$B_{N_1}$ can be performed in at most $2N_1$ additions of elements in
$k$, and multiplications by powers of $\psi$, that is, by powers of $y$,
can also be performed in $O(N_1)$ operations since
$y^{3N_1 i + e} = y^e$ and
$y^{3N_1 i + 2N_1 + f} = -y^{N_1 + f} - y^f$ for every $i \ge 0$,
$0 \le e < 2N_1$, and $0 \le f < N_1$. Therefore, multiplication of any
element of $B_{N_1}$ by a power of $y$ can be performed by at most one
addition of two polynomials in $B_{N_1}$ and a sign inversion of it,
that is, in at most $4N_1$ additive operations in $k$ (again, if we
count a sign inversion as an operation with unit cost, otherwise it is
just $2N_1$). Overall, this step requires
$O(N_1 \cdot N_2 \log N_2) = O(N \log N)$ operations in $k$.

Multiply component-wise two vectors of length $3N_2$, $\tilde a$ and
$\tilde b$. Note, however, that only $2N_2$ out of these products are
enough, namely only $\tilde a_i \cdot \tilde b_i$ where
$i \not\equiv 0 \pmod 3$. This is explained in the next step.

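Compute inverse DFT of $(\tilde c_0, \dots, \tilde c_{3N_2-1})$ in
$O(N \log N)$ operations in $k$. This computes the coefficients of
$c'(x) = a(x) b(x) \pmod{x^{3N_2} - 1}$, and we need

\[ c(x) = a(x) b(x) \pmod{x^{2N_2} + x^{N_2} + 1}. \]

This is resolved by noticing that

\[ c_i = c'_i - c'_{i+2N_2}, \qquad c_{i+N_2} = c'_{i+N_2} - c'_{i+2N_2}, \]

for all $i = 0, \dots, N_2 - 1$. To compute these differences, consider
the explicit formulas of the direct DFT of order $3N_2$ with respect to
$\psi$:

\[ \tilde c_{3i+j} = \sum_{\nu=0}^{3N_2-1} \psi^{(3i+j)\nu}\, c'_\nu = \sum_{\nu=0}^{N_2-1} (\psi^3)^{i\nu}\, c'_{\nu,j} \tag{6} \]

for $0 \le i < N_2$, $0 \le j \le 2$ and
$c'_{\nu,j} = (c'_\nu + \psi^{jN_2} c'_{\nu+N_2} + \psi^{2jN_2} c'_{\nu+2N_2}) \cdot \psi^{j\nu}$.
Therefore, since $\psi^{2N_2} + \psi^{N_2} + 1 = 0$,

\[ \psi^{-j\nu} c'_{\nu,j} = (c'_\nu - c'_{\nu+2N_2}) + \psi^{jN_2} (c'_{\nu+N_2} - c'_{\nu+2N_2}), \qquad j = 1, 2, \]

and the required differences

\[ c'_\nu - c'_{\nu+2N_2}, \qquad c'_{\nu+N_2} - c'_{\nu+2N_2} \]

can be computed from $c'_{\nu,j}$ for $j = 1, 2$, which can be computed
via (6) from
$\tilde c_{i,j} = \tilde c_{3i+j} = \tilde a_{3i+j} \tilde b_{3i+j}$ for
$i = 0, \dots, N_2 - 1$ and $j = 1, 2$, that is, from $2N_2$ products.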

Unembed in a similar way as in the original Schönhage-Strassen's
algorithm. This requires $O(N)$ operations in $k$.

If we denote again by $L'_B(N)$ the total complexity of multiplication
in $B_N$ via Schönhage's algorithm $B$, we obtain the following
complexity inequality:

\[ L_B(n) \le L'_B(N) \le 2N_2\, L'_B(N_1) + O(N \log N). \]

It implies $L'_B(N) = O(N \log N \log\log N)$ and the desired estimates
(4) since $N < 3n - 2$. Again, a more careful examination of the numbers
of additions and multiplications used also gives the upper bounds (4).

3.2.3 Cantor-Kaltofen's Generalization 

In [11] Cantor and Kaltofen presented a generalized version of
Schönhage-Strassen's algorithm [24], an algorithm $C$ which computes the
coefficients of a product of two polynomials over an arbitrary, not
necessarily commutative, not necessarily associative algebra with unity
with upper bounds (4). Here we present a simplified version of this
algorithm which works over fields, or, more generally, over division
algebras. We will use this restriction to perform divisions by constants
of an algebra via multiplication by inverses of these constants.

Let $\omega \in \mathbb{C}$ be a primitive $n$-th root of unity. Then
$\Phi_n(x) = \prod_{(i,n)=1} (x - \omega^i)$ is called the cyclotomic
polynomial of order $n$. One easily deduces that for each $n$,

\[ \Phi_n(x) \mid (x^n - 1) = \prod_{0 \le i < n} (x - \omega^i). \]

It is well known that all coefficients of $\Phi_n(x)$ are integers, that
for every $n$, $\Phi_n(x)$ is irreducible over $\mathbb{Q}$, and that
for any $s$, $n$, $\Phi_{s^n}(x) = \Phi_s(x^{s^{n-1}})$. The degree of
$\Phi_n(x)$ is the number of natural numbers $i \le n$ coprime with $n$,
which is denoted by $\varphi(n)$ and called Euler's totient function.
Trivially, $\varphi(n) \le n-1$ with equality iff $n$ is a prime, and
for $n \ge 3$, $\varphi(n) > \frac{1}{2} \cdot \frac{n}{\log n}$, see
[20]. From the above properties of $\Phi_n(x)$ we also have
$\varphi(s^n) = s^{n-1} \varphi(s)$ for all $s, n \ge 1$. Therefore, if
$s$ is a constant and $n$ grows, then the number of monomials in
$\Phi_{s^n}(x)$ is bounded by a constant (for example, by $s$).
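The identity $\Phi_{s^n}(x) = \Phi_s(x^{s^{n-1}})$ and the resulting sparsity can be checked on small cases. The sketch below computes $\Phi_n$ from the factorization $x^n - 1 = \prod_{d \mid n} \Phi_d(x)$ by exact division; the function names are hypothetical, and polynomials are integer coefficient lists with the lowest degree first.

```python
def poly_divexact(num, den):
    """Exact quotient num / den of integer polynomials; den must be monic."""
    num = num[:]
    q = [0] * (len(num) - len(den) + 1)
    for i in range(len(q) - 1, -1, -1):
        q[i] = num[i + len(den) - 1]
        for j, d in enumerate(den):
            num[i + j] -= q[i] * d
    assert not any(num), "division was not exact"
    return q


def cyclotomic(n, _cache={}):
    """Coefficients of Phi_n, via x^n - 1 = product of Phi_d over d | n."""
    if n not in _cache:
        poly = [-1] + [0] * (n - 1) + [1]          # x^n - 1
        for d in range(1, n):
            if n % d == 0:
                poly = poly_divexact(poly, cyclotomic(d))
        _cache[n] = poly
    return _cache[n]


def substitute_power(poly, m):
    """Coefficients of poly(x^m)."""
    out = [0] * ((len(poly) - 1) * m + 1)
    out[::m] = poly
    return out
```

For instance, `cyclotomic(27)` has the same three nonzero coefficients as `cyclotomic(3)`, only spread out, which is why reduction modulo $\Phi_{s^n}(x)$ costs a number of operations per coefficient that depends on $s$ alone.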

Let $k$ be a field of characteristic $p$, and let $s \ge 2$ be some
integer with $p \nmid s$ that is fixed throughout the entire algorithm.
Cantor-Kaltofen's algorithm
takes two degree n — 1 polynomials a{x) and b{x) over fc forn > and proceeds 
as follows: 

Embed and pad Set u := [logg((4n — 2) logs)] , such that 

N := (Pis") >2n-l. 

The multiplication is then performed in Cjv := k[x]/^M{x), where a; is a 
principal N-th root of unity. 

Extend Set A^i := s^-^icpis), N2 := sTtI-^ such that for 

7V3:=sL*J+\ Ni = (piNs) > N2, N1N2 = N. 

Note, that SN2 \ N3. Encode polynomials a{x) and b{x) (considered as 
elements of Cjv) as polynomials of degree N2 — 1 over CjVg: 

N-l N2-I / Ni-1 \ 

i=0 i=0 \ j=0 J 
^ — V ' 

y is a principal N^-ih root of unity in Cf^^ , therefore, -0 = y "2 is a principal 
iV2-th root of unity and ^ = y''«2 is a principal sA'2-th root of unity in 

Note, that x^^ 1— > y, and the polynomials ai = ai{y) are in fact of degree 
at most [ "'^ 2~ 1 • This follows from the fact, that a; = for Z > n, that is, 
for i + N2j > n, for < i < N2 and < j < iVi. One can easily verify, 
that it is equivalent to the inequality j < ^ — 1 < — 1 < ["l^J ~ 1- 
Therefore, multiplication of any two polynomials taken from the linear 
span of Gi modulo $ Ns {y) is in fact the ordinary multiplication of these 
polynomials. 

Compute DFTs S = DFT*^(o(x)), = T>FT%^{a{^x)) , t = BFT%^{b{x)), 

and b' = 'DFT^^{b{^x)) . Precomputation of coefficient of a{^x) and b{^x) 
requires 0(A^2) multiplications by small powers of y in Cat., . Computation 
of the DFTs requires 0{N2 log additions and multiplications by powers 
of V', that is, by powers of y, in C^Vg. Note, that, as usual, addition of two 
elements in CjVg requires N2 = ^{N^) additions of elements in k. 
Multiplications by powers of $\psi$, that is, by powers of $y$, can
first be performed modulo $x^{N_3} - 1$ at no cost (since they are in
this case simply cyclic shifts), followed by a reduction modulo
$\Phi_{N_3}(x)$. This is possible since $\Phi_{N_3}(x)$ divides
$x^{N_3} - 1$. Since $\Phi_{N_3}(x)$ is monic and has at most $s$
nonzero monomials, such a reduction can be performed with at most
$(s-1)(N_3 - N_1) = O(N_3)$ scalar multiplications and the same number
of additions of elements in $k$. Therefore, the total cost of this step
is $O(N_3 \cdot N_2 \log N_2) = O(N \log N)$ since
$N_3 \le 2 \log s \cdot N_1 = O(N_1)$ and $N_1 N_2 = N$.

Multiply component- wise two pairs of vectors of length N2: c" = a -b and 
c' = a' -b'. This is performed recursively by the same procedure since the 
components of these vectors are elements in CjVs- 

Compute inverse DFTs c' = DFT'j^l' (c') and c" = T)FT%~\c"). This re- 
quires again 0(iVlog A^) steps, as in the computation of the direct DFTs. 

Now recall, that we need the coefficients Ci G Cns of the product of 
polynomials c{x) = a{x)b{x) (mod ^n{x)). For this, we shall first com- 
pute the coefficients cq, . . . , C2N2-2 of the regular polynomial product 
c{x) = a{x)b{x). These can easily be computed from the c^, c" via the 
following formulas for < i < A^2: 

In order to get rid of divisions in C.V;, we can use the identity 

^ 2<i<s, 
(i,s) = l 

where r = 1 if s is not a prime power, and r = p if .s = p'^ for some 
prime p. Note, that in the latter case necessarily char k ^ p. This identity 
shows how one can compute the fraction in 2^(s) — 1 additions and 
multiplications by powers of y in Cn^ without divisions: miiltiplication of 
the intermediate product 11 by the next factor 1 — ^-'^^^ can be computed 
as n - Therefore, all coefficients Cj for < i < 2N2 - 2 can be 

computed in 0(N) operations in k. In order to obtain the coefficients of 
c{x), it suffices to reduce the polynomial c{x) modulo ^n^{x) which can 
be performed in 0{N) steps, as explained before. 

Unembedding in this case is not needed because of the choice of the encod- 
ing of polynomials: coefficients Ci computed in the Multiplication step, 
decoded back by substituting y 1— x^^ , turn into polynomials in x with 
monomials of pairwise different degrees for different i = 0, . . . , N — 1. 

If we denote (N) the total complexity of multiplication in Cjv via Cantor- 
Kaltofen's algorithm C, we obtain following complexity inequality: 

Lc{n) < L'c{N) < 2N2L'c (iVi) + 0{NlogN). 
The choice of parameters $N_1$ and $N_2$ implies $L'_C(N) = O(N \log N \log\log N)$ and
the desired estimates (4) since < (s— 1) log s- (2n— 1) = 0{n). A more careful 
examination of the numbers of additions and multiplications used again gives 
also the upper bounds (4). 

If $\operatorname{char} k \ne 2$ and $s = 2$, then
$\Phi_{2^\nu}(x) = x^{2^{\nu-1}} + 1$, and we get the multiplication in
the algebra $A_{2^{\nu-1}}$ from Schönhage-Strassen's algorithm. If
$\operatorname{char} k \ne 3$ and $s = 3$, then
$\Phi_{3^\nu}(x) = x^{2 \cdot 3^{\nu-1}} + x^{3^{\nu-1}} + 1$ and we get
the multiplication in the algebra $B_{3^{\nu-1}}$. However, the
multiplication is performed differently: instead of performing one DFT
of order $N_2 \approx 2\sqrt{N}$ over $A_{N_1}$ (of order $3N_2$ over
$B_{N_1}$ with only $2N_2 \approx 2\sqrt{N}$ multiplications sufficient,
resp.), Cantor-Kaltofen's algorithm performs two DFTs of order
$N_2 \approx \sqrt{N}$ over $C_{N_3}$.

Summarizing the above algorithms of complexity $O(n \log n \log\log n)$, we notice that whenever it is impossible to apply the FFT directly in the ground field, a ring extension is introduced. Since the costs of all recursive steps are roughly the same, the total complexity of such an algorithm is naturally bounded by the cost of one recursive step times the number of steps, which is $O(\log\log n)$ in the algorithm B. The complexity of one recursive step is determined by the complexity of computing DFTs, for which nothing better than $O(n \log n)$-time algorithms for a DFT of order $n$ is currently known. The first potential improvement of this scheme is to reduce the complexity of algorithms computing the DFT. The second is to reduce the number of recursive steps of such an algorithm. In the first case we can afford more recursive steps, depending on the speed-up achieved in computing the DFT. In the second case we can afford more operations in the DFT computations. In both cases we must make sure that the product of these two values does not asymptotically exceed $n \log n \log\log n$. In this paper we are concerned mostly with reducing the recursive depth of such algorithms. The effectiveness of our solution appears to depend only on algebraic properties of the ground field.



4 An Upper Bound for the Complexity of DFT 

In this section we summarize the best known upper bounds for the computation of DFTs over an algebra $A$ with unity 1. Let $\omega \in A$ be a principal $n$-th root of unity. For $a(x) \in A[x]$ of degree $n - 1$ let $\hat a = \mathrm{DFT}_n^{\omega}(a(x)) \in A^n$. We will denote the total number of operations over $A$ that are sufficient for an algebraic algorithm to compute the DFT of order $n$ over $A$ by $D_A(n)$. When the algebra $A$ is insignificant or clear from the context, we will use the notation $D(n)$.

There is always an obvious way to compute $\hat a$ from the coefficients of $a(x)$.

Lemma 1. For every $A$ and $n \ge 1$ such that the DFT of order $n$ is defined over $A$,

\[
D_A(n) \le
\begin{cases}
2n^2 - 3n + 1, & \text{if } 2 \nmid n, \\
2n^2 - 5n + 4, & \text{if } 2 \mid n.
\end{cases}
\tag{7}
\]

Proof. To compute $\hat a_0$, $n - 1$ additions are always sufficient. Let $\omega \in A$ be a principal $n$-th root of unity. If $2 \mid n$, then $\omega^{n/2} = -1$, and to compute $\hat a_{n/2}$, $n - 1$ additions/subtractions are also sufficient. For each of the remaining coefficients $\hat a_i$, one always needs $n - 1$ additions and, in case of odd $n$, $n - 1$ multiplications by powers of $\omega$. For even $n$, one multiplication can be saved: since $\omega^{i n/2} = (-1)^i$, it can be implemented by selectively changing the sign of the corresponding additive operation in the sum for $\hat a_i$. Therefore, we obtain

\[
D_A(n) \le
\begin{cases}
(n - 1) + 2(n - 1)^2 = 2n^2 - 3n + 1, & \text{if } 2 \nmid n, \\
2(n - 1) + (n - 2)\bigl((n - 2) + (n - 1)\bigr) = 2n^2 - 5n + 4, & \text{if } 2 \mid n,
\end{cases}
\]

which proves the statement. □
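The counting in the proof of Lemma 1 can be reproduced by instrumenting a naive DFT. The sketch below is illustrative only: it takes $A = \mathbb{Z}/q\mathbb{Z}$ for a prime $q$ (an assumption made for the example), and the function names are hypothetical. Multiplications by $\omega^0 = 1$ are skipped and multiplications by $\omega^{n/2} = -1$ are folded into a subtraction, exactly as in the proof.

```python
def lemma1_bound(n):
    """Right-hand side of the bound of Lemma 1."""
    return 2 * n * n - 3 * n + 1 if n % 2 else 2 * n * n - 5 * n + 4


def naive_dft(a, omega, q):
    """Naive DFT of order n = len(a) over Z/qZ with respect to omega.

    Returns (values, ops), where ops counts additions/subtractions plus
    multiplications by nontrivial powers of omega.
    """
    n = len(a)
    ops = 0
    out = []
    for i in range(n):
        acc = a[0]
        for j in range(1, n):
            e = (i * j) % n
            if e == 0:            # omega^0 = 1: no multiplication needed
                term = a[j]
            elif 2 * e == n:      # omega^(n/2) = -1: becomes a subtraction
                term = -a[j]
            else:
                term = a[j] * pow(omega, e, q)
                ops += 1
            acc = (acc + term) % q
            ops += 1
        out.append(acc)
    return out, ops


# 3 has order 5 modulo 11, and 2 has order 4 modulo 5.
_, ops_odd = naive_dft([1, 2, 3, 4, 5], 3, 11)
_, ops_even = naive_dft([1, 2, 3, 4], 2, 5)
assert ops_odd == lemma1_bound(5)
assert ops_even == lemma1_bound(4)
```

For composite odd $n$ the instrumented count can fall below the bound, because additional exponents reduce to 0; the lemma only claims an upper bound.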

The next method of effective reduction of a DFT of large order to DFTs of smaller orders is known as Cooley-Tukey's algorithm [14], [13, Section 4.1] and is based on the following lemma, which follows directly from well-known facts and is presented here for completeness.

Lemma 2. Let the DFT of order

\[ n = p_1^{d_1} \cdots p_s^{d_s} \ge 2 \tag{8} \]

be defined over $A$ (the $p_\sigma$ are not necessarily prime or even pairwise coprime). Then

\[
D(n) \le n \sum_{\sigma=1}^{s} \Bigl( \frac{d_\sigma}{p_\sigma} \bigl(D(p_\sigma) - 1\bigr) + d_\sigma \Bigr) - n + 1. \tag{9}
\]



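Proof. We first prove that if $n = n_1 n_2$, then

\[ D(n) \le n_1 D(n_2) + n_2 D(n_1) + (n_1 - 1)(n_2 - 1). \tag{10} \]

Let $\omega \in A$ be a principal $n$-th root of unity. Then $\omega_1 := \omega^{n_2}$ is a principal $n_1$-th root of unity and $\omega_2 := \omega^{n_1}$ is a principal $n_2$-th root of unity. For a polynomial $a(x) \in A[x]/(x^n - 1)$, consider $\hat a = \mathrm{DFT}_n^{\omega}(a(x))$: for $0 \le j < n_2$, $0 \le l < n_1$,

\begin{align*}
\hat a_{j n_1 + l}
 &= \sum_{\nu=0}^{n-1} a_\nu\, \omega^{\nu (j n_1 + l)}
  = \sum_{\kappa=0}^{n_1-1} \sum_{\mu=0}^{n_2-1} a_{\kappa n_2 + \mu}\, \omega^{(\kappa n_2 + \mu)(j n_1 + l)} \\
 &= \sum_{\mu=0}^{n_2-1} \omega_2^{\mu j}\, \omega^{\mu l} \Bigl( \sum_{\kappa=0}^{n_1-1} a_{\kappa n_2 + \mu}\, \omega_1^{\kappa l} \Bigr)
  = \sum_{\mu=0}^{n_2-1} \omega_2^{\mu j}\, \omega^{\mu l}\, \tilde a_{\mu,l}
  = \sum_{\mu=0}^{n_2-1} \omega_2^{\mu j}\, a'_{\mu,l},
\end{align*}

where $\tilde a_{\mu,l} := \sum_{\kappa=0}^{n_1-1} a_{\kappa n_2 + \mu}\, \omega_1^{\kappa l}$ and $a'_{\mu,l} := \omega^{\mu l}\, \tilde a_{\mu,l}$.

Computation of all values $\hat a_{j n_1 + l}$ for a fixed $l$ can be performed via the DFT of order $n_2$ with respect to $\omega_2$. Therefore, to compute all values $\hat a_i$ for $0 \le i < n$, it suffices to perform $n_1$ DFTs of order $n_2$. Computation of all values $\tilde a_{\mu,l}$ for a fixed $\mu$ can be performed via the DFT of order $n_1$ with respect to $\omega_1$. Therefore, to compute all values $\tilde a_{\mu,l}$, it suffices to perform $n_2$ DFTs of order $n_1$. Finally, to compute $a'_{\mu,l}$ from $\tilde a_{\mu,l}$, one needs one multiplication by $\omega^{\mu l}$ if $\mu > 0$ and $l > 0$ (if $\mu = 0$ or $l = 0$ then no computation is needed). This takes $(n_1 - 1)(n_2 - 1)$ multiplications by powers of $\omega$ to compute all values $a'_{\mu,l}$. This proves (10).

(9) follows by consecutive application of (10), choosing $d_1$ times $p_1$ for $n_1$, then $d_2$ times $p_2$, etc. Noting that $D(1) = 0$ completes the proof. □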
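The decomposition used in the proof of (10) can be written out directly. The following is a minimal sketch, assuming $A = \mathbb{Z}/q\mathbb{Z}$ for a prime $q$ and using hypothetical function names; the inner DFTs are computed naively, so the code illustrates the index arithmetic rather than the operation count.

```python
def dft(a, omega, q):
    """Reference DFT over Z/qZ: out[i] = sum_v a[v] * omega^(v*i)."""
    n = len(a)
    return [sum(a[v] * pow(omega, v * i, q) for v in range(n)) % q
            for i in range(n)]


def cooley_tukey(a, omega, q, n1, n2):
    """One Cooley-Tukey step for n = n1 * n2, following the proof of (10)."""
    n = n1 * n2
    assert len(a) == n
    w1 = pow(omega, n2, q)   # principal n1-th root of unity
    w2 = pow(omega, n1, q)   # principal n2-th root of unity
    # n2 DFTs of order n1: tilde[mu][l] = sum_kappa a[kappa*n2 + mu] * w1^(kappa*l)
    tilde = [dft([a[kappa * n2 + mu] for kappa in range(n1)], w1, q)
             for mu in range(n2)]
    # Twiddle factors: multiplication by omega^(mu*l), trivial if mu = 0 or l = 0
    prime = [[tilde[mu][l] * pow(omega, mu * l, q) % q for l in range(n1)]
             for mu in range(n2)]
    # n1 DFTs of order n2: out[j*n1 + l] = sum_mu w2^(mu*j) * prime[mu][l]
    out = [0] * n
    for l in range(n1):
        column = dft([prime[mu][l] for mu in range(n2)], w2, q)
        for j in range(n2):
            out[j * n1 + l] = column[j]
    return out


# 2 has multiplicative order 12 modulo 13.
a = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
assert cooley_tukey(a, 2, 13, 3, 4) == dft(a, 2, 13)
```

Replacing the naive inner `dft` calls by recursive calls to `cooley_tukey` gives the recursion that leads to (9).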

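Corollary 1. Let $n$ be as in (8), and let $2 = p_1 < p_2 < \dots < p_s$ all be primes. Then

\[
D(n) \le \Bigl( \frac{3}{2} d_1 + 2 \sum_{\sigma=2}^{s} d_\sigma (p_\sigma - 1) - 1 \Bigr) n + 1. \tag{11}
\]

In particular,

\[
D(n) \le 2 \max_{1 \le \sigma \le s} p_\sigma \cdot n \log n. \tag{12}
\]

Proof. (11) follows from (9) by applying the upper bound of Lemma 1 for the values of $D(p_\sigma)$.

Obviously $d_1, \dots, d_s \le \log n$ since $p_\sigma^{d_\sigma} \le n$, $p_\sigma \ge 2$ for $1 \le \sigma \le s$. Therefore,

\[
D(n) \le \Bigl( \frac{3}{2} + 2 \bigl( \max_{1 \le \sigma \le s} p_\sigma - 1 \bigr) - 1 \Bigr) n \log n + 1 \le 2 \max_{1 \le \sigma \le s} p_\sigma \cdot n \log n,
\]

which proves (12). □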
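As a small worked instance of (10) and (11), consider $n = 12 = 2^2 \cdot 3$ (the value of $n$ is chosen here only for illustration). Both routes give the same bound, which is well below the bound $2 \cdot 12^2 - 5 \cdot 12 + 4 = 232$ of Lemma 1 applied to $n = 12$ directly:

```latex
\begin{align*}
D(2) &\le 2, \qquad D(3) \le 2\cdot 3^2 - 3\cdot 3 + 1 = 10
  && \text{(Lemma 1)},\\
D(4) &\le 2D(2) + 2D(2) + (2-1)(2-1) = 9
  && \text{(10) with } n_1 = n_2 = 2,\\
D(12) &\le 4D(3) + 3D(4) + (4-1)(3-1) = 40 + 27 + 6 = 73
  && \text{(10) with } n_1 = 4,\ n_2 = 3,\\
D(12) &\le \bigl(\tfrac{3}{2}\cdot 2 + 2\cdot 1\cdot(3-1) - 1\bigr)\cdot 12 + 1 = 73
  && \text{(11) with } d_1 = 2,\ d_2 = 1.
\end{align*}
```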

Lemma 2 provides an efficient method of reduction of a DFT of composite order $n$ to several DFTs of smaller orders which divide $n$. For example, if all $p_\sigma$ in (8) are bounded by some constant, then (12) shows that Cooley-Tukey's algorithm computes the DFT of order $n$ in $O(n \log n)$ steps. Furthermore, if $\max_{1 \le \sigma \le s} p_\sigma \le g(n)$ for some slowly growing function $g(n)$, say $g(n) = o(\log\log n)$, then (12) gives an upper bound of $O(n \log n \cdot g(n))$ for the computation of the DFT of order $n$. However, this method fails to be effective if $n$ has large prime factors (or is just prime). We could use the algorithm from Lemma 1, but sometimes we can apply Rader's algorithm to compute a DFT of prime order [22], [13, Section 4.2].

Lemma 3. Let $p$ be a prime, and assume that the DFT of order $p$ is defined over $A$.

1. If the DFT of order $p - 1$ is defined over $A$, then $D(p) \le 2D(p - 1) + O(p)$.

2. If for $n > 2p - 4$ the DFT of order $n$ is defined over $A$, then $D(p) \le 2D(n) + O(n)$.

Remark 1. Note that the first bound can be efficient if $p - 1$ is a smooth number. Otherwise we may choose some larger smooth $n$ for the second case, making sure that the DFT of order $n$ exists over $A$ and $n$ is not too large, that is, to achieve an $O(p \log p)$ upper bound for $D(p)$.

Proof. Let $\omega \in A$ be a principal $p$-th root of unity. For a polynomial

\[ a(x) \in A[x]/(x^p - 1), \]

the value of $\hat a_0 = \sum_{i=0}^{p-1} a_i$ is computed directly by performing $p - 1$ additions. For $1 \le i \le p - 1$,

\[ \hat a_i - a_0 = \sum_{j=1}^{p-1} a_j\, \omega^{ij} =: \hat a'_i. \tag{13} \]

Thus, to compute all $\hat a_i$ from $\hat a'_i$, $p - 1$ additions are enough.

1. The multiplicative group $\mathbb{F}_p^* = \{1 \le i < p\}$ is isomorphic to the cyclic group $\mathbb{Z}_{p-1}$ with $p - 1$ elements. We will denote the isomorphism $\mathbb{Z}_{p-1} \to \mathbb{F}_p^*$ by $\sigma$. For $a''_i := a_{\sigma(-i)}$ and $\hat a''_i := \hat a'_{\sigma(i)}$, from (13) we obtain

\[ \hat a''_i = \sum_{j=1}^{p-1} a_j\, \omega^{\sigma(i) j} = \sum_{l=0}^{p-2} a''_l\, \omega^{\sigma(i - l)}. \]

The latter is a cyclic convolution, which can be performed via computing the coefficients of the product of the degree $p - 2$ polynomial

\[ a''(x) = \sum_{i=0}^{p-2} a''_i\, x^i \]

and the degree $p - 2$ polynomial with fixed coefficients

\[ \omega(x) = \sum_{l=0}^{p-2} \omega^{\sigma(l)}\, x^l. \]

This can be achieved by computing the DFT of $a''(x)$, performing $p - 1$ multiplications by constants (components of the DFT of $\omega(x)$; in fact, these are just polynomials in $\omega$), and computing the reverse DFT. This proves the first bound.

2. For an $n \ge 2p - 3$, we may define the polynomials

\[ a''(x) = a''_0 + a''_1 x^{n-p+2} + \dots + a''_{p-2} x^{n-1}, \qquad \omega(x) = \sum_{i=0}^{n-1} \omega^{\sigma(i \bmod (p-1))}\, x^i \]

and compute their cyclic convolution. Then the first $p - 1$ coefficients of the cyclic convolution will be exactly $\hat a''_0, \dots, \hat a''_{p-2}$. Note that again we do not need to count the complexity of the DFT of $\omega(x)$ since it is fixed and can be precomputed. This proves the second bound. □
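The reduction of a DFT of prime order to a cyclic convolution can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: $A = \mathbb{Z}/q\mathbb{Z}$ for a prime $q$, the isomorphism is taken as $\sigma(i) = g^i \bmod p$ for a generator $g$ of $\mathbb{F}_p^*$, the indexing convention $a''_i = a_{\sigma(-i)}$ is one consistent choice, and the convolutions are evaluated directly instead of via DFTs. All function names are hypothetical.

```python
def dft(a, omega, q):
    """Reference DFT over Z/qZ."""
    n = len(a)
    return [sum(a[v] * pow(omega, v * i, q) for v in range(n)) % q
            for i in range(n)]


def cyclic_convolution(u, v, q):
    """Direct cyclic convolution of two sequences of equal length."""
    n = len(u)
    return [sum(u[t] * v[(i - t) % n] for t in range(n)) % q for i in range(n)]


def rader(a, omega, q, g, n=None):
    """DFT of prime order p = len(a) via a cyclic convolution.

    With n=None a convolution of length p - 1 is used (first case of the
    lemma); otherwise a zero-padded convolution of length n >= 2p - 3.
    """
    p = len(a)
    m = p - 1
    sigma = [pow(g, i, p) for i in range(m)]           # sigma(i) = g^i mod p
    a2 = [a[sigma[(-i) % m]] for i in range(m)]        # a''_i = a_{sigma(-i)}
    w = [pow(omega, sigma[l], q) for l in range(m)]    # fixed coefficients
    if n is None:
        conv = cyclic_convolution(a2, w, q)
    else:
        assert n >= 2 * p - 3
        padded = [0] * n
        padded[0] = a2[0]
        for l in range(1, m):                          # a''_l sits at n - (p-1) + l
            padded[n - m + l] = a2[l]
        w_periodic = [w[i % m] for i in range(n)]      # coefficients repeated
        conv = cyclic_convolution(padded, w_periodic, q)[:m]
    out = [0] * p
    out[0] = sum(a) % q                                # p - 1 additions
    for i in range(m):
        out[sigma[i]] = (a[0] + conv[i]) % q           # one addition each
    return out


# p = 5, q = 11: 3 has order 5 modulo 11, and 2 generates F_5^*.
a = [3, 1, 4, 1, 5]
assert rader(a, 3, 11, 2) == dft(a, 3, 11)
assert rader(a, 3, 11, 2, n=7) == dft(a, 3, 11)
```

In an actual application of the lemma, `cyclic_convolution` would be replaced by two DFTs of order $p - 1$ (or $n$) and a pointwise product with the precomputed transform of the fixed sequence.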

Corollary 2. Let $p$ be a fixed odd prime, and let $k$ be a field where the DFT of order $p^N - 1$ is defined for $N = 2^n$, $n \ge \lceil \log(2p - 5) \rceil$. Then $D_k(p^N - 1) = O(p^N \cdot N^2)$.

Proof. We have $p^N - 1 = (p - 1)(p + 1)(p^2 + 1) \cdots (p^{2^{n-1}} + 1)$. Since $p$ is odd, each factor is even, so $p^N - 1$ is divisible by $2^n$. Let $p^N - 1 = p_1^{d_1} \cdots p_s^{d_s}$ be the decomposition of $p^N - 1$ into primes with $p_1 = 2 < p_2 < \dots < p_s$, where $p_2, \dots, p_{i_1}$ are all less than $p$, $p_{i_1+1}, \dots, p_{i_2}$ are less than $\frac{p^2+1}{2}$, and, in general, $p_{i_j+1}, \dots, p_{i_{j+1}}$ are less than or equal to $\frac{p^{2^j}+1}{2}$. Note that $i_n = s$. We also set $i_{-1} = 0$, $i_0 = 1$. From (9) we have

\[
D(p^N - 1) \le (p^N - 1) \sum_{\sigma=1}^{s} \Bigl( \frac{d_\sigma}{p_\sigma} \bigl(D(p_\sigma) - 1\bigr) + d_\sigma \Bigr) - p^N + 2.
\]

Obviously, for $p_1 = 2$, we have $D(p_1) = 2 \le p_1 \log p_1$. Using Lemma 3 we can compute the DFT of orders $p_\sigma$ for $\sigma = 2, \dots, i_1$ in $8 p_\sigma \log p_\sigma + O(p_\sigma)$ time, since we can reduce each DFT of order $p_\sigma$ to 2 DFTs of order $2^{n_1} > 2p_\sigma - 4$, $2^{n_1} < 4p_\sigma$. This is possible since the DFT of order $2^n > 2p_\sigma - 4$ is defined over $k$. In the same way, the DFT of order $p_\sigma$ for $\sigma = i_1 + 1, \dots, i_2$ can be computed in $16 p_\sigma \log p_\sigma + O(p_\sigma)$ steps, by reducing it to DFTs of a smooth order exceeding $2p_\sigma - 4$ that is defined over $k$. Continuing this process we obtain the following upper bound:

\[
D(p^N - 1) \le (p^N - 1) \sum_{j=-1}^{n-1} \sum_{\sigma=i_j+1}^{i_{j+1}} O\bigl( d_\sigma \cdot 2^{j} \log p_\sigma + d_\sigma \bigr)
= O\Bigl( p^N \cdot N \cdot \log \prod_{\sigma=1}^{s} p_\sigma^{d_\sigma} \Bigr) = O(p^N \cdot N^2),
\]

which completes the proof. □

Remark 2. For a fixed odd prime $p$, the DFT of order $p^{2^n} - 1$ is defined in the field $\mathbb{F}_{p^{2^n}}$ since the multiplicative group $\mathbb{F}_{p^{2^n}}^*$ of order $p^{2^n} - 1$ is cyclic. Corollary 2 implies that the DFT of order $p^{2^n} - 1$ can be computed in $O(p^{2^n} \cdot 2^{2n})$ steps over $\mathbb{F}_{p^{2^n}}$. A similar argument shows that the same holds for any field of characteristic $p$ which contains $\mathbb{F}_{p^{2^n}}$ as a subfield.
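The factorization of $p^{2^n} - 1$ on which the proof of Corollary 2 rests is easy to check for small parameters. The sketch below is illustrative only (the function names are hypothetical); it confirms that the factors multiply to $p^{2^n} - 1$ and that each of them is even for odd $p$.

```python
from math import prod


def tower_factors(p, n):
    """Factors (p - 1), (p + 1), (p^2 + 1), ..., (p^(2^(n-1)) + 1)."""
    return [p - 1] + [p ** (2 ** i) + 1 for i in range(n)]


def two_adic_valuation(m):
    """Largest v such that 2^v divides m (m > 0)."""
    v = 0
    while m % 2 == 0:
        m //= 2
        v += 1
    return v


for p, n in [(3, 3), (5, 2), (7, 3)]:
    factors = tower_factors(p, n)
    order = p ** (2 ** n) - 1
    assert prod(factors) == order
    assert all(f % 2 == 0 for f in factors)
    # n + 1 even factors, so at least 2^(n+1) divides the order
    assert two_adic_valuation(order) >= n + 1
```

Since every odd prime divisor of an even factor $m$ is at most $m/2$, the prime divisors of $p^{2^n} - 1$ fall into the size classes used in the proof.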



5 Unified Approach for Fast Polynomial Multiplication

In this section we present our main contribution. We proceed as follows: first we introduce the notions of the degree function and of the order sequence of a field. Then we describe the DFT-based algorithm $\mathcal{D}_k$ which computes the product of two polynomials over a field $k$. We show that $\mathcal{D}_k$ generalizes any algorithm for polynomial multiplication that relies on consecutive applications of DFT, and in particular, Schönhage-Strassen's [24], Schönhage's [23], and Cantor-Kaltofen's [11] algorithms for polynomial multiplication are special cases of the algorithm $\mathcal{D}_k$. We prove that both the upper and the lower bounds for the total complexity of the algorithm $\mathcal{D}_k$ depend on the degree function of $k$ and the existence of special order sequences for $k$. In particular, we show that $L_{\mathcal{D}_k}(n) = \Omega(n \log n)$ when $k$ is a finite field, and $L_{\mathcal{D}_{\mathbb{Q}}}(n) = \Omega(n \log n \log\log n)$. Furthermore, we show sufficient conditions on the field $k$ for the algorithm $\mathcal{D}_k$ to compute the product of two degree $n$ polynomials in $o(n \log n \log\log n)$, that is, to outperform Schönhage-Strassen's, Schönhage's and Cantor-Kaltofen's algorithms. Finally, we pose a number-theoretic conjecture whose validity would imply faster polynomial multiplication over arbitrary fields of positive characteristic.

In what follows $k$ always stands for a field.

5.1 Extension Degree and Order Sequence 

Definition 1. The degree function of $k$ is $f_k(n) = [k(\omega_n) : k]$ for any positive $n$, where $\omega_n$ is a primitive $n$-th root of unity in the algebraic closure of $k$.

For example, $f_k(n) = 1$ if $k$ is algebraically closed, $f_{\mathbb{R}}(n) = 1$ if $n \le 2$ and $f_{\mathbb{R}}(n) = 2$ for $n \ge 3$, $f_{\mathbb{Q}}(n) = \varphi(n)$ where $\varphi(n)$ is as before Euler's totient function.
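For concreteness, the degree function can be evaluated for two ground fields. The sketch below is an illustration with hypothetical function names. For $\mathbb{Q}$ it uses $f_{\mathbb{Q}}(n) = \varphi(n)$ as stated above. For a prime field $\mathbb{F}_p$ it relies on a standard fact that is not spelled out in the text: for $\gcd(n, p) = 1$, the degree $[\mathbb{F}_p(\omega_n) : \mathbb{F}_p]$ is the multiplicative order of $p$ modulo $n$, i.e. the least $d$ with $n \mid p^d - 1$.

```python
from math import gcd


def degree_over_Q(n):
    """f_Q(n) = phi(n), Euler's totient function (brute-force evaluation)."""
    return sum(1 for i in range(1, n + 1) if gcd(i, n) == 1)


def degree_over_Fp(p, n):
    """f_{F_p}(n) for gcd(n, p) = 1: the multiplicative order of p modulo n."""
    if gcd(n, p) != 1:
        raise ValueError("n must be coprime to the characteristic p")
    d, x = 1, p % n
    while x != 1 % n:
        x = x * p % n
        d += 1
    return d


assert degree_over_Q(12) == 4
assert degree_over_Fp(2, 7) == 3       # 7 divides 2^3 - 1
assert degree_over_Fp(3, 80) == 4      # 80 divides 3^4 - 1
```

Since $n \mid p^d - 1$ forces $p^d > n$, the degree over $\mathbb{F}_p$ is always larger than $\log_p n$, so large roots of unity over finite fields require extensions of at least logarithmic degree.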

An important idea behind Fürer's algorithm [16, 15] is a field extension of
small degree containing a principal root of unity of high smooth order. In case
of integer multiplication, the characteristic of the ground ring is a parameter we
can choose [15], and it allows us to pick such Zpc that p''' — I has a large smooth 
factor. However, in case of multiplication of polynomials over fields, we cannot 
change the characteristic of the ground field. In what follows we explore this 
limitation. 

Definition 2. An integer $n > 0$ is called $c$-suitable over the field $k$ if the DFT of order $n$ is defined over $k$ and $D_k(n) \le c\, n \log n$.

It follows from Corollary 1 that any $c$-smooth $n$ is $c$-suitable over $k$ as long as the DFT of order $n$ is defined over $k$, and Lemma 3 also implies that if for each prime divisor $p$ of $n$ either $p - 1$ or some $n' \ge 2p - 3$, $n' = O(p)$, is $c$-suitable over $k$, then $n$ is $O(c)$-suitable. If $\operatorname{char} k \ge 3$, then the integers $(\operatorname{char} k)^{2^n} - 1$ are $2^n$-suitable over $k$ for arbitrary $n$ (see Remark 2).

Definition 3. Let $s(n) : \mathbb{N} \to \mathbb{R}$ be such that $s(n) > 1$. A sequence

\[ \mathcal{N} = \{n_1, n_2, \dots\} \]

is called an order sequence of sparseness $s(n)$ for the field $k$ if

\[ n_i < n_{i+1} \le s(n_i)\, n_i \]

and $n_i \mid n_{i+1}$ for $i \ge 1$, and $n_i = n'_i n''_i$, such that there exists a ring extension of $k$ of degree $n'_i$ containing an $n''_i$-th principal root of unity $\omega_{n''_i}$, where $n''_i$ is $O(1)$-suitable over this extension. If $s(n) \le C$ for some constant $C$, then $\mathcal{N}$ is called an order sequence of constant sparseness.

It follows from Remark 2 that $n_i = 2^i \cdot (p^{2^i} - 1)$ is almost an order sequence of sparseness $s(n) = n$ for any field of characteristic $p$. Decreasing the upper bound for the computation of the DFT from $O(n \log^2 n)$ to $O(n \log n)$ would turn it into an order sequence.

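Remark 3. If $\operatorname{char} k \ne 2$, then for the order sequence $\mathcal{N} = \{2^i\}_{i \ge 1}$, $f_k(n'') \le \frac{n''}{2}$ for each $n = n'n'' \in \mathcal{N}$ with $n' = 2^{\lceil \nu/2 \rceil}$, $n'' = 2^{\lfloor \nu/2 \rfloor + 1}$: if $\omega_{n''}$ is a primitive $n''$-th root of unity in the algebraic closure of $k$, then

\[ k(\omega_{n''}) = k[x]/p(x) \]

and $p(x) \mid x^{n''/2} + 1$. The same argument shows that if $\operatorname{char} k \ne 3$ and $\mathcal{N} = \{2 \cdot 3^i\}_{i \ge 1}$, then $f_k(n'') \le \frac{2n''}{3}$ for each $n = n'n'' \in \mathcal{N}$, $n' = 2 \cdot 3^{\lceil \nu/2 \rceil}$, $n'' = 3^{\lfloor \nu/2 \rfloor + 1}$, since for $k(\omega_{n''}) = k[x]/p(x)$, $p(x) \mid x^{2n''/3} + x^{n''/3} + 1$. Both these order sequences have constant sparseness.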

Definition 4. A field $k$ is called

• Fast, if there is an order sequence $\mathcal{N}$ of constant sparseness such that $f_k(n''_i) = O(1)$ for all $n_i = n'_i n''_i \in \mathcal{N}$.

• $t(n)$-Fast, if there exists an order sequence $\mathcal{N}$ of constant sparseness such that $f_k(n''_i) \le t(n''_i)$ for all $n_i = n'_i n''_i \in \mathcal{N}$.

• $t(n)$-Slow, if for any order sequence $\mathcal{N}$ of constant sparseness, $f_k(n''_i) \ge t(n''_i)$ for all $n_i = n'_i n''_i \in \mathcal{N}$.

For example, any algebraically closed field is fast, $\mathbb{R}$ is a fast field, and $\mathbb{Q}$ is a $\varphi(n)$-slow field, in particular, $\mathbb{Q}$ is an $\frac{n}{2 \log n}$-slow field. It follows from Remark 3 that any field of characteristic different from 2 is $\frac{n}{2}$-fast, and any field of characteristic different from 3 is $\frac{2n}{3}$-fast.

If we want to extend a $t(n)$-slow field $k$ with an $n$-th root of unity, the degree of the extension will be $\Omega(t(n))$. We will see that to increase the performance of a DFT-based algorithm for computing the product of two degree $n - 1$ polynomials over $k$, we need to take an extension $K \supseteq k$ of degree $n_1$ over $k$ such that $K$ contains a primitive $n_2$-th root of unity. We will want $n_2$ to be a large suitable number and to belong to a "not too sparse" order sequence, preferably of constant sparseness, and $n_1$ to be small, such that $2n - 1 \le n_1 n_2 = O(n)$.

We close this subsection by introducing some technical notation. For a function $f : \mathbb{N} \to \mathbb{N}$ such that $\limsup_{n \to \infty} f(n) = \infty$, we will denote by $f^{\diamond}(n)$ the minimal value $f(i)$ over all integer solutions $i$ of the inequality

\[ i \cdot f(i) \ge n. \]

For example, $\left(\frac{n}{2}\right)^{\diamond} \sim \sqrt{\frac{n}{2}}$ for $n \ge 2$,¹ and for $q \ge 2$, $(\log_q n)^{\diamond} = \log_q n - \Theta(\log_q \log_q n)$ if $n \ge q$. We will need to restrict the possible values for $i$ in the inequality to be taken from some order sequence.

¹ By $f(n) \sim g(n)$ we denote $f(n) = (1 \pm o(1))\, g(n)$.

For a monotonically growing function $f : \mathbb{N} \to \mathbb{N}$ such that $\lim_{n \to \infty} \frac{f(n)}{n} < 1$, we define $f^{(0)}(n) = n$ and, for $i \ge 1$, $f^{(i)}(n) = f^{(i-1)}(f(n))$. For each $n \ge 1$ there exists a value $i = i(n)$ such that

\[ f^{(i-1)}(n) \ne f^{(i)}(n) = f^{(i+1)}(n) = \cdots. \]

This value will be denoted by $f^*(n)$. For example,

\[ [n/2]^* = [\log n], \qquad [\sqrt{n}]^* = [\log\log n], \qquad ([\log n])^* = [\log^* n]. \]

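Both pieces of notation introduced above are easy to evaluate for concrete functions. The sketch below is illustrative only and uses hypothetical names: `lower_value` computes the minimal value $f(i)$ over the solutions of $i \cdot f(i) \ge n$, assuming $f$ is nondecreasing (so that the minimum is attained at the least solution), and `star` counts how many applications of $f$ are needed before the value stops changing.

```python
from math import isqrt


def lower_value(f, n):
    """Minimal f(i) over all i >= 1 with i * f(i) >= n, for nondecreasing f."""
    i = 1
    while i * f(i) < n:
        i += 1
    return f(i)


def star(f, n):
    """Number of applications of f to n until a fixed point is reached."""
    count = 0
    while True:
        m = f(n)
        if m == n:
            return count
        n = m
        count += 1


def ceil_log2(n):
    """Ceiling of log2(n) for n >= 2, and 0 for n <= 1."""
    return (n - 1).bit_length() if n > 1 else 0


assert lower_value(lambda i: i, 100) == 10       # i * i >= 100 first holds at i = 10
assert star(lambda n: n // 2, 8) == 4            # 8 -> 4 -> 2 -> 1 -> 0
assert star(isqrt, 16) == 3                      # 16 -> 4 -> 2 -> 1
assert star(ceil_log2, 65536) == 5               # 65536 -> 16 -> 4 -> 2 -> 1 -> 0
```

The exact values depend on the rounding convention and on where the iteration stabilizes, so they agree with $\log n$, $\log\log n$ and $\log^* n$ only up to small additive constants.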
5.2 Generalized Algorithm For Polynomial Multiplication 

The DFT-based algorithm A, Schönhage-Strassen's and Schönhage's algorithms B, and Cantor-Kaltofen's algorithm C are all based on the idea of a field extension with roots of unity of large smooth orders, used to reduce the polynomial multiplication to many polynomial multiplications of smaller degrees by means of DFT. The natural metaflow of all these algorithms can be generalized as follows: let $\mathcal{N}$ be an order sequence of constant sparseness over a field $k$; for two polynomials $a(x)$ and $b(x)$ of degree $n - 1$ over $k$:

Embed Choose a polynomial $P_N(x)$ of degree $N = N'N'' \in \mathcal{N}$,

\[ 2n - 1 \le N = O(n), \]

and switch to multiplication in $A_N := k[x]/P_N(x)$. From this moment consider $a(x)$ and $b(x)$ as elements of $A_N$. There should be an injective homomorphism $\psi : A_N \to (A_{N'})^{2N''}$, efficiently computable by means of DFTs, where $A_{N'} = k[y]/P_{N'}(y)$ for some $P_{N'}(y) \in k[y]$, and $A_{N'}$ contains a principal $N''$-th (or $2N''$-th) root of unity.

Transform By means of DFTs over $A_{N'}$ compute

\[ \hat a := \psi(a(x)), \qquad \hat b := \psi(b(x)), \]

both in $(A_{N'})^{2N''}$.

Multiply Compute $2N''$ products $\hat c := \hat a \cdot \hat b$ in $A_{N'}$.

Back-Transform By means of DFT compute $c(x) = \psi^{-1}(\hat c)$, which is the ordinary product of the input polynomials.

Unembed Reduce the product modulo $P_N(x)$ to return the product in $A_N$.
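The simplest instance of this metaflow, with no extension at all ($N' = 1$, $A_{N'} = k$), can be sketched in a few lines. The code below is only an illustration under stated assumptions: $k = \mathbb{Z}/q\mathbb{Z}$ for a prime $q$ containing a primitive $N$-th root of unity `omega`, $P_N(x) = x^N - 1$, the DFTs are evaluated naively, and all function names are hypothetical.

```python
def dft(a, omega, q):
    """Naive DFT over Z/qZ with respect to omega."""
    n = len(a)
    return [sum(a[j] * pow(omega, i * j, q) for j in range(n)) % q
            for i in range(n)]


def schoolbook(a, b, q):
    """Reference product of two coefficient lists over Z/qZ."""
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] = (c[i + j] + x * y) % q
    return c


def dft_multiply(a, b, q, N, omega):
    """Embed / Transform / Multiply / Back-Transform over k = Z/qZ."""
    # Embed: work in k[x]/(x^N - 1); no wrap-around occurs when deg(ab) < N.
    assert len(a) + len(b) - 1 <= N
    assert pow(omega, N, q) == 1
    # Transform: DFTs of the zero-padded inputs.
    fa = dft(a + [0] * (N - len(a)), omega, q)
    fb = dft(b + [0] * (N - len(b)), omega, q)
    # Multiply: N pointwise products in k.
    fc = [x * y % q for x, y in zip(fa, fb)]
    # Back-Transform: inverse DFT uses omega^(-1) and the scalar 1/N.
    inv_n = pow(N, -1, q)
    inv_omega = pow(omega, -1, q)
    c = [inv_n * v % q for v in dft(fc, inv_omega, q)]
    # Unembed: trivial here, only drop the padding.
    return c[: len(a) + len(b) - 1]


# 2 has multiplicative order 8 modulo 17, and 2 * 4 - 1 <= 8.
a, b = [1, 2, 3, 4], [5, 6, 7, 8]
assert dft_multiply(a, b, 17, 8, 2) == schoolbook(a, b, 17)
```

When the ground field has no suitable root of unity, the pointwise products are no longer products in $k$ but in the extension $A_{N'}$, and the same scheme is applied recursively to compute them.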

Theorem 1. The algorithm A, Schönhage-Strassen's and Schönhage's algorithms B and Cantor-Kaltofen's algorithm C are instances of the algorithm $\mathcal{D}$.

Proof. For a field $k$ which contains an $N$-th primitive root of unity for $N = 2^{\lceil \log(2n-1) \rceil} = O(n)$, set $P_N(x) = x^N - 1$, $N' = 1$, $N'' = N$ and $A_{N'} = k$. Then $\psi$ is the DFT of order $2N$ (which can be trivially reduced to $N$ in this case) over $k$ and the algorithm $\mathcal{D}$ appears to be the algorithm A.

For a field $k$ of characteristic different from 2, for $\nu = \lceil \log(2n - 1) \rceil$ and $N = 2^{\nu}$, set $P_N(x) = x^N + 1$, $N' = 2^{\lceil \nu/2 \rceil}$ and $N'' = 2^{\lfloor \nu/2 \rfloor}$. Then $\psi$ is the DFT of order $2N''$ over $A_{N'}$ and the algorithm $\mathcal{D}$ appears to be Schönhage-Strassen's algorithm B [24].

For $\operatorname{char} k = 2$, set $\nu = \lceil \log_3(n - \tfrac{1}{2}) \rceil$, $N = 3^{\nu}$, and $P_{2N}(x) = x^{2N} + x^{N} + 1$, $N' = 3^{\lceil \nu/2 \rceil}$, and $N'' = 3^{\lfloor \nu/2 \rfloor}$. Then $\psi$ is the DFT of order $3N''$ over $A_{N'}$. However, to fetch the entries of the product in $A_{N'}$ by means of $\psi^{-1}$, $2N''$ products of polynomials in $A_{N'}$ are sufficient [23]. Therefore, the algorithm $\mathcal{D}$ appears to be Schönhage's algorithm B.

For an arbitrary field $k$ fix a positive integer $s \ne \operatorname{char} k$ and find the least $\rho$ such that $\varphi(s^{\rho}) = s^{\rho-1} \varphi(s) \ge 2n - 1$, and let $N = s^{\rho}$. Set $P_N(x) = \Phi_N(x)$, $N' = \varphi(s^{\lfloor \rho/2 \rfloor + 1})$, and $N'' = s^{\lceil \rho/2 \rceil - 1}$. Then $\psi = \alpha \circ \beta$ where $\alpha$ stands for 2 DFTs of order $N''$ over $A_{N'}$, and $\beta$ is a linear map $A_{N'}[x] \to A_{N'}[x] \times A_{N'}[x]$ such that $\beta(a(x)) = (a(x), a(\gamma x))$, where $\gamma$ is the $sN''$-th root of unity in $A_{N'}$, i.e., for $A_{N'} = k[y]/P_{N'}(y)$, either $\gamma = y$ or $\gamma = y^{s}$. One can easily show that $\beta$ and $\beta^{-1}$ are computable in linear time. Therefore, the algorithm $\mathcal{D}$ appears to be Cantor-Kaltofen's algorithm C. □



5.3 Complexity Analysis 

From the description of the algorithm $\mathcal{D}$ we have

\[ L_{\mathcal{D}}(n) = L'_{\mathcal{D}}(N) = 2N''\, L'_{\mathcal{D}}(N') + 2\,T(\psi(N)) + T(\psi^{-1}(N)), \]

where $L'_{\mathcal{D}}(N)$ denotes the complexity of $\mathcal{D}$ computing the product in $A_N$, and $T(\psi(N))$ and $T(\psi^{-1}(N))$ stand for the total complexities of the transformations $\psi$ and $\psi^{-1}$ on inputs of length $N$, respectively.

Theorem 2. Let the algorithm $\mathcal{D}$ compute the product of two polynomials in $A_N$ in $\ell$ recursive steps, let $N' = N'_\lambda$ and $N'' = N''_\lambda$ be chosen on the step $\lambda = 1, \dots, \ell$ ($N'_1 N''_1 = N$, $N'_\ell = O(1)$), and let $M(N'_\lambda) = \max\bigl\{1, \frac{M^*(N'_\lambda)}{N'_\lambda}\bigr\}$, where $M^*(N'_\lambda)$ stands for the complexity of multiplication of an element in $A_{N'_\lambda}$ by powers of an $N''_\lambda$-th root of unity (which exists in $A_{N'_\lambda}$ by assumption). Then

\[
L'_{\mathcal{D}}(N) = \Theta\Bigl( N \cdot 2^{\ell} + N \sum_{\lambda=1}^{\ell} 2^{\lambda-1} \cdot M(N'_\lambda) \log N''_\lambda \Bigr), \tag{14}
\]

and if $\operatorname{char} k \ne 2$, then

\[
L'_{\mathcal{D}}(N) = \Omega\Bigl( N \cdot 2^{(f_k^{\diamond})^*(N)} + N \sum_{\lambda=1}^{(f_k^{\diamond})^*(N)} 2^{\lambda-1} \log (f_k^{\diamond})^{(\lambda)}(N) \Bigr). \tag{15}
\]

Proof. Consider the total cost of the algorithm with respect to the computational cost of the first step:

\[
L'_{\mathcal{D}}(N) = 2N'' \cdot L'_{\mathcal{D}}(N') + \Theta\bigl( N'' \log N'' \cdot (N' + M^*(N')) \bigr). \tag{16}
\]

This follows from the fact that we need to perform a DFT of order $N''$ over $A_{N'}$. Each DFT requires $\Theta(N'' \log N'')$ additions of elements in $A_{N'}$ and the same number of multiplications by powers of an $N''$-th principal root of unity. Since $\dim_k A_{N'} = N'$, one addition in $A_{N'}$ takes $N'$ additions in $k$, and by definition, $M^*(N')$ is the number of operations in $k$ needed to compute the necessary products by powers of a principal root of unity. Unrolling (16) (by using (16) recursively $\ell$ times), (14) follows.

To obtain (15) from (14) we use the trivial lower bound $M(N') \ge 1$. We then notice that $N' \ge f_k^{\diamond}(N'')$; therefore, we come to the equality $N'' = O(1)$ not earlier than for $\ell = (f_k^{\diamond})^*(N)$, by definition of these operations, and the lower bound (15) follows. □
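The unrolling step in this proof can be written out explicitly. The following derivation is a sketch under the assumption (made here only to fix notation) that the parameters of consecutive steps are linked by $N = N'_1 N''_1$ and $N'_{\lambda-1} = N'_\lambda N''_\lambda$; it uses $N' + M^*(N') = \Theta\bigl(N' \cdot M(N')\bigr)$, so that the non-recursive term of (16) at the first step is $\Theta\bigl(N\, M(N'_1) \log N''_1\bigr)$.

```latex
\begin{align*}
L'_{\mathcal D}(N)
 &= 2N''_1\, L'_{\mathcal D}(N'_1)
    + \Theta\bigl(N\, M(N'_1)\log N''_1\bigr)\\
 &= 4N''_1 N''_2\, L'_{\mathcal D}(N'_2)
    + \Theta\bigl(N\, M(N'_1)\log N''_1 + 2N\, M(N'_2)\log N''_2\bigr)\\
 &\;\;\vdots\\
 &= 2^{\ell} N''_1\cdots N''_{\ell}\, L'_{\mathcal D}(N'_{\ell})
    + \Theta\Bigl(N\sum_{\lambda=1}^{\ell} 2^{\lambda-1} M(N'_\lambda)\log N''_\lambda\Bigr).
\end{align*}
```

Since $N''_1 \cdots N''_\ell = N / N'_\ell$ and $N'_\ell = O(1)$, the first term is $\Theta(N \cdot 2^{\ell})$, which gives the form stated in (14).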

Corollary 3.

1. For an arbitrary fast field $k$, we have $L_{\mathcal{D}_k}(n) = O(n \log n)$.

2. For an $o(\log\log n)$-fast field $k$, we have $L_{\mathcal{D}_k}(n) = o(n \log n \log\log n)$.

3. For an $\Omega(n^{1-o(1)})$-slow field $k$, we have $L_{\mathcal{D}_k}(n) = \Omega(n \log n \log\log n)$.

Proof.

1. By definition of a fast field, it suffices to take a constant number of steps (in fact, even one step) to extend $k$ with a principal root of unity of a suitable order. This means $\ell = 1$ and $N' = O(1)$. Therefore, $M(N') = O(1)$ and trivially $\log N'' \le \log N$.

2. By definition of an $o(\log\log n)$-fast field, in the first step we have

\[ N' = o(\log\log N). \]

We can always bound $M(N'_\lambda)$ by $N'_\lambda$ in (14), and we have

\[ \ell = o(\log^* \log^* n). \]

Bounding the first summand in the sum in (14) by

\[ N \cdot N' \cdot \log N = o(n \log n \log\log n), \]

and each next summand by $o\bigl(n \cdot 2^{\log^* n} \cdot \log\log n \cdot \log(\log\log n)\bigr)$, we obtain the statement.

3. For $f_k(n) = \Omega(n^{1-o(1)})$ we have $f_k^{\diamond}(n) = \Omega(n^{1/2-o(1)})$ and

\[ (f_k^{\diamond})^*(n) = \Omega(\log\log n). \]

Each summand in (15) is therefore $\Omega(\log n)$ and the statement follows. □

Corollary 4. $L_{\mathcal{D}_{\mathbb{Q}}}(n) = \Omega(n \log n \log\log n)$.

Proof. We have $f_{\mathbb{Q}}(n) \ge \frac{n}{2 \log n} = \Omega(n^{1-o(1)})$ and the statement follows from Corollary 3. □

Corollary 5. For the finite field $\mathbb{F}_p$, $L_{\mathcal{D}_{\mathbb{F}_p}}(n) = \Omega(n \cdot \log n)$.

Proof. We have $f_{\mathbb{F}_p}(n) \ge \log_p n$ since the multiplicative group $\mathbb{F}_{p^n}^*$ is cyclic and in the extension field $\mathbb{F}_{p^n}$ of degree $n$ there exists a primitive root of unity of order $p^n - 1$. This means that $f_{\mathbb{F}_p}^{\diamond}(n) \sim \log_p n$ and $(f_{\mathbb{F}_p}^{\diamond})^*(n) \sim \log^* n$, and the statement follows from taking in (15) the first summand, which is always $\Omega(n \log n)$. □

Note that Theorem 2 does not give any pessimistic lower bound in case of finite fields. Actually, it can give a good upper bound if one can prove the existence of order sequences of constant sparseness over finite fields. More formally:

Corollary 6. Assume there exists an order sequence $\mathcal{N} = \{n_i (p^{n_i} - 1)\}_{i \ge 1}$ of constant sparseness over $\mathbb{F}_p$, and assume that multiplication by powers of a principal $(p^{n_i} - 1)$-th root of unity in $\mathbb{F}_{p^{n_i}}$ can be performed in $O(n_i)$ time. Then $L_{\mathcal{D}_{\mathbb{F}_p}}(n) = O(n \log n \log^* n)$.

Proof. From (16) we get $L'_{\mathcal{D}_{\mathbb{F}_p}}(N) \le \frac{2N}{\log_p N}\, L'_{\mathcal{D}_{\mathbb{F}_p}}(\log_p N) + O(N \log N)$, and the statement follows from the solution of this inequality. □

There are two challenges in finding a faster polynomial multiplication algorithm over finite fields. The first challenge is the already mentioned existence of order sequences of constant sparseness over these fields. This conjecture is due to Bläser [5].

Conjecture (Bläser). There exist order sequences of constant sparseness over finite fields.

In Remark 2 we showed that suitable order sequences indeed exist; however, they are too sparse for our purposes. The second challenge is the complexity of multiplication by powers of a primitive root of unity in extension fields. However, there are ways to overcome this with a slight complexity increase. We recently obtained some progress in this area, and we think that a general improvement for fields of characteristic different from 2 and 3 is possible.

6 Conclusion

We generalized the notion of a DFT-based algorithm for polynomial multiplication, which describes uniformly all currently known fastest algorithms for polynomial multiplication over arbitrary fields. We parameterized fields by introducing the notions of the degree function and of order sequences, and showed upper and lower bounds for DFT-based algorithms in terms of these parameters.

There is still an important open question whether one can improve the general Schönhage-Strassen upper bound. As an outcome of this paper we support the general experience that this question is not very easy. In particular, using only known DFT-based techniques is unlikely to help much in the case of arbitrary fields, in particular for the case of the rational field, as they did for the complexity of integer multiplication.